A New Data Mining Algorithm based on MapReduce and Hadoop
نویسندگان
چکیده
The goal of data mining is to discover hidden useful information in large databases. Mining frequent patterns from transaction databases is an important problem in data mining. As the database size increases, the computation time and required memory also increase. Base on this, we use the MapReduce programming mode which has parallel processing ability to analysis the large-scale network. All the experiments were taken under hadoop, deployed on a cluster which consists of commodity servers. Through empirical evaluations in various simulation conditions, the proposed algorithms are shown to deliver excellent performance with respect to scalability and execution time.
منابع مشابه
Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments
Hadoop MapReduce framework is an important distributed processing model for large-scale data intensive applications. The current Hadoop and the existing Hadoop distributed file system’s rack-aware data placement strategy in MapReduce in the homogeneous Hadoop cluster assume that each node in a cluster has the same computing capacity and a same workload is assigned to each node. Default Hadoop d...
متن کاملA Survey on MapReduce Performance and Hadoop Acceleration
MapReduce is implementation for generating large data sets with a parallel, distributed algorithm on a cluster. Hadoop is open source implementation of the MapReduce programming datamodel used for large-scale parallel applications such as web indexing, data mining, and scientific simulation. Hadoop-A framework is able to levitate Hadoop acceleration and give significant performance compared to ...
متن کاملImproving Efficiency and Time Complexity of Big Data Mining using Apache Hadoop with HBase storage model
Data Mining is the science of mining the knowledge from the raw data and applying to improvement of the industrial rules. Now for the mining of “ big data “ we required new approach new algorithm and new techniques and analytics to mining the knowledge from it. Day by day a huge amount of data is generated and the usage is expanding .The term BIGDATA is a popular term which used to describe the...
متن کاملWeighted Itemset Mining from Bigdata using Hadoop
Data items have been extracted using an empirical data mining technique called frequent itemset mining. In majority of theapplication contexts items are enriched with weights. Pushing an item weights into the itemset extraction process, i.e., mining weighted itemsets rather than traditional itemsets, is an appealing research direction. Although many efficient weighteditemset mining algorithms a...
متن کاملPerformance Analysis of Apriori Algorithm with Different Data Structures on Hadoop Cluster
Mining frequent itemsets from massive datasets is always being a most important problem of data mining. Apriori is the most popular and simplest algorithm for frequent itemset mining. To enhance the efficiency and scalability of Apriori, a number of algorithms have been proposed addressing the design of efficient data structures, minimizing database scan and parallel and distributed processing....
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2014